
🤖 feat: boundary-windowed chat loading + metadata-only workspace activity #2493

Merged
ThomasK33 merged 42 commits into main from chat-subscriptions-dv4n
Feb 20, 2026

Conversation

@ThomasK33
Member

Summary

Overhaul chat subscription architecture to load only the current compaction epoch on startup, scope full transcript streaming to the active workspace, and add cursor-based "Load More" pagination for older history.

Background

Previously, every workspace started a full onChat subscription that replayed from the penultimate compaction boundary, meaning all workspaces eagerly loaded two epochs of history regardless of whether they were visible. This caused unnecessary data transfer at startup and steady-state bandwidth waste.

This PR introduces three key changes:

  1. Latest-boundary replay — initial load starts from skip=0 (latest boundary only) instead of skip=1
  2. Active-workspace scoping — only the currently-displayed workspace gets a full onChat stream; all others use a lightweight metadata-only activity feed for sidebar indicators
  3. Load More pagination — a cursor-based ORPC endpoint + frontend button to page backwards through older compaction epochs on demand

Implementation

Backend (agentSession.ts, historyService.ts, ORPC schemas/router)

  • emitHistoricalEvents() now calls getHistoryFromLatestBoundary(workspaceId, 0)
  • New workspace.history.loadMore endpoint returns a single older boundary window with cursor-based pagination
  • New getHistoryBoundaryWindow() helper scans boundaries to return exactly one epoch window at a time
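
The window-scan idea behind getHistoryBoundaryWindow() can be sketched as follows. This is illustrative only: the message shape, the in-memory array, and the exact return fields are assumptions, not the real HistoryService implementation.

```typescript
// Illustrative only: assumed message shape, not the real MuxMessage type.
interface HistoryMessage {
  historySequence: number;
  isCompactionBoundary: boolean;
}

interface BoundaryWindow {
  messages: HistoryMessage[];
  hasOlder: boolean;
}

// Return exactly one compaction-epoch window strictly older than the cursor.
function getHistoryBoundaryWindow(
  history: HistoryMessage[],
  beforeHistorySequence: number
): BoundaryWindow {
  // Keep only rows strictly older than the cursor (exclusive upper bound).
  const older = history.filter((m) => m.historySequence < beforeHistorySequence);
  if (older.length === 0) {
    return { messages: [], hasOlder: false };
  }

  // The window starts at the nearest boundary at or before the newest older
  // row, or at the start of history if no boundary exists in that range.
  let start = 0;
  for (let i = older.length - 1; i >= 0; i--) {
    if (older[i].isCompactionBoundary) {
      start = i;
      break;
    }
  }
  return {
    messages: older.slice(start),
    hasOlder: start > 0, // rows still exist before this window
  };
}
```

Calling this repeatedly with the previous result's oldest sequence as the next cursor walks backwards one epoch per call.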

Frontend aggregator (StreamingMessageAggregator.ts)

  • Renamed pruneBeforePenultimateBoundary → pruneBeforeLatestBoundary
  • Live compaction now prunes everything before the incoming boundary's sequence, keeping only the current epoch

Frontend store (WorkspaceStore.ts, WorkspaceContext.tsx)

  • addWorkspace() no longer starts runOnChatSubscription() — subscription is managed by ensureActiveOnChatSubscription()
  • Only one full onChat stream active at a time, switched via setActiveWorkspaceId()
  • Activity subscription (workspace.activity.list/subscribe) provides streaming/recency/model fallbacks for non-active workspaces
  • Per-workspace pagination state initialized on caught-up, exposed as hasOlderHistory/loadingOlderHistory in WorkspaceState

Frontend UI (ChatPane.tsx)

  • "Load older messages" button above transcript when hasOlderHistory is true
  • Disabled with "Loading..." text during fetch
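
As a minimal sketch of the button's derived view state, using the hasOlderHistory/loadingOlderHistory flags described above (the helper itself is hypothetical, not ChatPane's actual code):

```typescript
// Hypothetical helper mirroring the UI states described above.
interface LoadMoreViewState {
  visible: boolean;
  disabled: boolean;
  label: string;
}

function deriveLoadMoreState(
  hasOlderHistory: boolean,
  loadingOlderHistory: boolean
): LoadMoreViewState {
  if (!hasOlderHistory) {
    // No button rendered once all epochs are loaded.
    return { visible: false, disabled: true, label: "" };
  }
  return loadingOlderHistory
    ? { visible: true, disabled: true, label: "Loading..." }
    : { visible: true, disabled: false, label: "Load older messages" };
}
```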

Validation

  • make typecheck
  • make lint
  • make fmt-check
  • bun test across 3 targeted test suites (150 tests, 427 assertions) ✅
  • bun test src/node/services/agentSession (43 tests) ✅

Risks

  • Non-active workspace sidebar indicators now depend on the activity subscription. If the activity feed lags or disconnects, sidebar status (streaming, model, recency) may briefly show stale values. Fallback logic merges aggregator + activity data, preferring the fresher value.
  • Load More edge cases — heavily corrupted histories with missing historySequence metadata may cause pagination to stop earlier than expected (safe degradation rather than crash).

📋 Implementation Plan

Plan: boundary-windowed chat loading + metadata-only workspace activity

Context / Why

We want chat startup to feel snappier by reducing unnecessary replay volume and decoupling sidebar status from full transcript streams.

Requested outcome:

  1. Initial open should stream only from the latest compaction boundary (current epoch), not the penultimate boundary.
  2. Add Load More pagination so each click reveals the previous compaction epoch window.
  3. Stop relying on full workspace.onChat streams for every workspace; use a metadata-only subscription for cross-workspace status (new activity / stream finished), while keeping full chat streaming focused on the active workspace.

This reduces startup data transfer, avoids replaying large historical tails by default, and preserves user-controlled access to older history.

Evidence

  • src/node/services/agentSession.ts
    • emitHistoricalEvents() currently calls historyService.getHistoryFromLatestBoundary(workspaceId, 1) (penultimate boundary replay) before caught-up.
    • onChat replay modes already exist (full / since / live) with cursor-based reconnect safety checks.
  • src/node/services/historyService.ts
    • getHistoryFromLatestBoundary(workspaceId, skip) already supports boundary-window selection (skip=0 latest, skip=1 previous, etc.), which can back a Load More flow.
  • src/browser/utils/messages/StreamingMessageAggregator.ts
    • Live compaction currently prunes to the penultimate boundary (pruneBeforePenultimateBoundary) and includes an explicit TODO for paginated older-history support.
  • src/browser/stores/WorkspaceStore.ts
    • addWorkspace() immediately starts runOnChatSubscription() for every workspace during syncWorkspaces().
    • runOnChatSubscription() already uses since cursor reconnects via aggregator.getOnChatCursor().
  • src/common/orpc/schemas/api.ts + src/node/orpc/router.ts + src/common/orpc/schemas/workspace.ts
    • A metadata-only feed already exists: workspace.activity.list + workspace.activity.subscribe with WorkspaceActivitySnapshot { recency, streaming, lastModel, lastThinkingLevel }.
    • Frontend currently does not consume this feed.

Storage layout assessment: split chat.jsonl into epoch files?

Recommendation: not in this iteration (keep single chat.jsonl + boundary/cursor pagination).

Why:

  • Current hot-path replay already avoids full-file parse by reading from a boundary offset (HistoryService.findLastBoundaryByteOffset + readHistoryFromOffset).
  • The bigger wins are at subscription scope/pagination level (active workspace streaming + metadata feed), not physical file sharding.
  • File-sharding would require touching many persistence assumptions:
    • HistoryService mutation APIs (appendToHistory, updateHistory, truncateHistory, migrateWorkspaceId) currently treat history as one atomic file.
    • Subagent transcript archival/indexing and ORPC transcript reads store fixed chat.jsonl paths.
    • CLI/debug tooling and path conventions assume ~/.mux/sessions/<workspace>/chat.jsonl.
If we eventually shard by compaction epoch, what must be added?
  • A manifest/index file (ordered shard list + min/max historySequence per shard).
  • Atomic write protocol for shard rollover at compaction boundaries.
  • Cross-shard lookup path for updateHistory(historySequence) and delete/truncate operations.
  • Compatibility readers for legacy single-file sessions.
  • Reassembly utility for any workflows that still require single-stream JSONL exports.

Estimated additional product LoC beyond the current plan: ~400–700 LoC (+ substantial test churn).

Recommended approach (A): active onChat + boundary pagination + activity metadata feed

Net LoC estimate (product code): ~260–360 LoC

1) Scope full onChat streaming to the active workspace only

Keep addWorkspace() for registration/aggregator creation, but manage exactly one full chat stream (the workspace currently displayed).

Files/symbols:

  • src/browser/stores/WorkspaceStore.ts
    • addWorkspace, removeWorkspace, syncWorkspaces
    • new fields: activeWorkspaceId, activeOnChatWorkspaceId
    • new methods: setActiveWorkspaceId, ensureActiveOnChatSubscription
  • src/browser/contexts/WorkspaceContext.tsx
    • call workspaceStore.setActiveWorkspaceId(currentWorkspaceId) in an effect
// shape only
setActiveWorkspaceId(workspaceId: string | null): void {
  this.activeWorkspaceId = workspaceId;
  this.ensureActiveOnChatSubscription();
}

private ensureActiveOnChatSubscription(): void {
  if (this.activeOnChatWorkspaceId === this.activeWorkspaceId) return;
  if (this.activeOnChatWorkspaceId) this.stopOnChat(this.activeOnChatWorkspaceId);
  if (this.activeWorkspaceId) this.startOnChat(this.activeWorkspaceId);
  this.activeOnChatWorkspaceId = this.activeWorkspaceId;
}

Defensive points:

  • Assert we never keep >1 active onChat subscription.
  • On workspace removal, tear down active stream if that workspace was active.
  • Preserve existing reconnect semantics (mode: "since" cursor, full fallback) inside runOnChatSubscription().
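
The single-subscription invariant from the first defensive point can be enforced with a runtime assertion. A minimal sketch with hypothetical start/stop bookkeeping (the real store manages ORPC subscriptions, but the invariant check has the same shape):

```typescript
// Hypothetical bookkeeping standing in for start/stop of onChat streams.
class OnChatManager {
  private active = new Set<string>();
  private activeWorkspaceId: string | null = null;

  setActiveWorkspaceId(workspaceId: string | null): void {
    if (this.activeWorkspaceId === workspaceId) return;
    if (this.activeWorkspaceId !== null) {
      this.active.delete(this.activeWorkspaceId); // tear down previous stream
    }
    if (workspaceId !== null) {
      this.active.add(workspaceId); // start stream for the new workspace
    }
    this.activeWorkspaceId = workspaceId;
    // Invariant: never more than one full onChat stream at a time.
    if (this.active.size > 1) {
      throw new Error("multiple onChat subscriptions active");
    }
  }

  activeCount(): number {
    return this.active.size;
  }
}
```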

2) Replay only from the latest compaction boundary on initial load

Switch onChat full replay baseline from penultimate boundary (skip=1) to latest boundary (skip=0).

Files/symbols:

  • src/node/services/agentSession.ts
    • emitHistoricalEvents() history load call + comments
  • src/browser/utils/messages/StreamingMessageAggregator.ts
    • replace penultimate pruning logic with latest-boundary pruning behavior
// agentSession.ts
const historyResult = await this.historyService.getHistoryFromLatestBoundary(
  this.workspaceId,
  0 // latest boundary only
);
// StreamingMessageAggregator.ts (shape)
if (this.isCompactionBoundarySummaryMessage(incomingMessage)) {
  this.pruneBeforeBoundarySequence(incomingMessage.metadata?.historySequence);
}

This keeps live behavior aligned with fresh loads: once a new boundary arrives, older epochs are pruned by default.

3) Add explicit “Load More history” API with cursor pagination

Expose a non-stream endpoint that pages older compaction epochs via a stable cursor (not page index).

Files/symbols:

  • src/common/orpc/schemas/api.ts
  • src/node/orpc/router.ts
  • src/node/services/workspaceService.ts
  • (optional helper extraction) src/node/services/historyService.ts

Recommended request/response shape:

// shape only
workspace.history.loadMore: {
  input: {
    workspaceId: string;
    cursor: {
      beforeHistorySequence: number; // oldest sequence currently loaded in UI (exclusive upper bound)
      beforeMessageId?: string;      // defensive anchor for mismatch detection
    } | null;
  };
  output: {
    messages: MuxMessage[]; // previous boundary window only (older segment)
    nextCursor: {
      beforeHistorySequence: number;
      beforeMessageId?: string;
    } | null;               // null => no older history available
    hasOlder: boolean;
  };
}

Why cursor (vs skip index):

  • Robust against dynamic history changes (compaction/edit/delete) between requests.
  • No client/server page-index drift.
  • Naturally supports prepend pagination by “load messages older than current oldest loaded row”.
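
To make the robustness argument concrete, here is a toy pager that treats the cursor as an exclusive upper bound on historySequence. fetchOlder and loadAll are hypothetical stand-ins for the endpoint and the store action, not the real implementations:

```typescript
interface Row {
  historySequence: number;
}

// "Server": return up to `limit` rows strictly older than the cursor.
function fetchOlder(history: Row[], beforeHistorySequence: number, limit: number) {
  const older = history
    .filter((r) => r.historySequence < beforeHistorySequence)
    .slice(-limit);
  const nextCursor = older.length > 0 ? older[0].historySequence : null;
  const hasOlder =
    older.length > 0 &&
    history.some((r) => r.historySequence < older[0].historySequence);
  return { rows: older, nextCursor, hasOlder };
}

// "Client": page backwards until no older rows remain. Because the cursor is
// a sequence bound rather than a page index, rows compacted away between
// requests are simply skipped instead of shifting page boundaries.
function loadAll(history: Row[], start: number, limit: number): number[] {
  const seen: number[] = [];
  let cursor: number | null = start;
  while (cursor !== null) {
    const page = fetchOlder(history, cursor, limit);
    seen.unshift(...page.rows.map((r) => r.historySequence));
    cursor = page.hasOlder ? page.nextCursor : null;
  }
  return seen;
}
```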

4) Implement Load More UX in chat transcript

Add a top-of-transcript control that prepends exactly one older boundary window per click.

Files/symbols:

  • src/browser/stores/WorkspaceStore.ts
    • new per-workspace pagination state: { nextCursor, hasOlder, loading }
    • new action: loadOlderHistory(workspaceId)
  • src/browser/components/ChatPane.tsx
    • render “Load More” control above message list
// shape only
async loadOlderHistory(workspaceId: string): Promise<void> {
  const page = this.historyPagination.get(workspaceId);
  if (!page?.hasOlder || page.loading) return;
  this.historyPagination.set(workspaceId, { ...page, loading: true });

  const result = await this.client!.workspace.history.loadMore({
    workspaceId,
    cursor: page.nextCursor,
  });

  // Prepend older slice while preserving current window + live tail.
  this.assertGet(workspaceId).loadHistoricalMessages(result.messages, false, { mode: "append" });

  this.historyPagination.set(workspaceId, {
    nextCursor: result.nextCursor,
    hasOlder: result.hasOlder,
    loading: false,
  });
  this.states.bump(workspaceId);
}

Implementation note: if append cannot safely preserve strict chronological order for prepends, add a dedicated prependHistoricalMessages() path in the aggregator and assert sequence monotonicity after merge.
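
A dedicated prepend path per that note might look like the following sketch; the message shape and helper name are taken from the note above, everything else is an assumption:

```typescript
interface Msg {
  historySequence: number;
}

// Merge an older window in front of the current one and assert strict
// monotonicity of historySequence across the seam.
function prependHistoricalMessages(current: Msg[], older: Msg[]): Msg[] {
  const merged = [...older, ...current];
  for (let i = 1; i < merged.length; i++) {
    if (merged[i].historySequence <= merged[i - 1].historySequence) {
      throw new Error(
        `non-monotonic historySequence after prepend: ` +
          `${merged[i - 1].historySequence} -> ${merged[i].historySequence}`
      );
    }
  }
  return merged;
}
```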

5) Use metadata-only activity feed for non-active workspace status

Leverage existing workspace.activity.list/subscribe as the metadata channel for unread + streaming indicators.

Files/symbols:

  • src/browser/stores/WorkspaceStore.ts
    • add activity snapshot map + runActivitySubscription()
    • merge activity fallback into getWorkspaceState() / sidebar derivation
  • src/browser/contexts/WorkspaceContext.tsx
    • no new API plumbing required if store owns the subscription lifecycle
// shape only fallback in getWorkspaceState()
const activity = this.workspaceActivity.get(workspaceId);
const canInterrupt = activeStreams.length > 0 || activity?.streaming === true;
const currentModel = aggregator.getCurrentModel() ?? activity?.lastModel ?? null;
const recencyTimestamp = aggregator.getRecencyTimestamp() ?? activity?.recency ?? null;

This preserves sidebar responsiveness (unread dot, streaming state, model tooltip) without paying transcript-stream costs for every workspace.

6) Tests to update/add

Files:

  • src/node/services/agentSession*.test.ts (or add targeted replay test)
  • src/browser/utils/messages/StreamingMessageAggregator.test.ts
  • src/browser/stores/WorkspaceStore.test.ts
  • src/browser/components/ChatPane*.test.tsx (or nearby transcript tests)

Test cases:

  1. Full replay now starts at latest boundary (skip=0) and emits expected caught-up metadata.
  2. Live boundary arrival prunes to latest-boundary window (no penultimate epoch retained).
  3. loadOlderHistory() advances cursor page-by-page, prepending one boundary window per click until nextCursor=null / hasOlder=false.
  4. Non-active workspaces keep unread/streaming indicators via activity snapshots without full onChat streams.
  5. Active workspace switch cleanly moves the sole full onChat subscription and keeps since reconnect behavior intact.

Why this approach over sequential full subscriptions?

Sequential full subscriptions reduce startup spikes but still replay transcript data for every workspace. Using one full stream (active workspace) plus metadata-only activity for the rest cuts both startup and steady-state bandwidth much more aggressively while keeping sidebar signal quality.

Validation plan

  • bun run test src/node/services/historyService.test.ts
  • bun run test src/node/services/agentSession*.test.ts (targeted replay-focused cases)
  • bun run test src/browser/utils/messages/StreamingMessageAggregator.test.ts
  • bun run test src/browser/stores/WorkspaceStore.test.ts
  • bun run test src/browser/components/Messages/MessageRenderer.test.tsx and/or ChatPane tests (if Load More UI is touched there)
  • make typecheck

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $0.95

Change-Id: I34b8bbedfb456c8b3a27b4906047082b9a448284
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Switch startup replay to the latest boundary and add backend paging for older compaction epochs.

- changed AgentSession replay to start at skip=0 (latest compaction boundary)
- added HistoryService.getHistoryBoundaryWindow() to return one older epoch window plus hasOlder
- exposed workspace.history.loadMore in ORPC schema/router/workspace service
- added historyService tests covering boundary-window paging behavior

Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I7f1ada3dd8daf93ec12ff18bc09edbcfe63758ed
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I3da0fed5263b45bc367ef1e243cf083bfbac57c7
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: Id0ffd056c360c8879fdf33c6fc1ddd3fc275fb48
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2200cbcfc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Change-Id: Iabe3f84b0cefe77a87863146377117904853bc62
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

Change-Id: Ifebd5176f6d91f2f393e082754571819265a76f4
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

Addressed both review comments:

  1. Pagination cursor refresh on live compaction — historyPagination is now recomputed in processStreamEvent when a compaction-boundary summary message arrives, so loadOlderHistory() uses the post-prune cursor.

  2. Stale activity fallback for active workspace — canInterrupt, currentModel, and currentThinkingLevel now only fall back to activity snapshots for non-active workspaces. The active workspace trusts the live aggregator exclusively.

Change-Id: I8b332c398a28b788587425f78d634bd72b1b3d99
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48b6215b01


…ed pagination threshold

Change-Id: Ia78e76e48c4466a1f68a2110c511e12e9f44a43c
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

Addressed both new review comments:

  1. Inactive workspaces now prefer activity snapshots — For non-active workspaces, canInterrupt, currentModel, and currentThinkingLevel now prefer the activity snapshot over potentially stale aggregator state (since inactive workspaces don't receive stream-end events).

  2. Fixed zero-based pagination threshold — hasOlder now uses historySequence > 0 instead of > 1, correctly recognizing that sequence 0 is valid and sequence 1 means there's still an older row.
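
The off-by-one can be seen with a minimal predicate over the window's oldest loaded sequence (illustrative only):

```typescript
// Sequence 0 is a valid row, so older rows exist exactly when the oldest
// loaded sequence is greater than 0. The previous `> 1` check wrongly
// reported "no older history" for a window starting at sequence 1, even
// though row 0 still existed.
function hasOlder(oldestLoadedSequence: number): boolean {
  return oldestLoadedSequence > 0;
}
```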

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f09caa963


Change-Id: Ic7544045454bbdc5e3d544ba678a67ddfb77d1d7
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I5b43c0ea2aba795d3b5ea441c56a642c09a0fe25
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

Change-Id: I723b02f703e4401310ecc2f88a3e4891811a9be9
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 019062e776


Change-Id: Ic6b8914985a58d6c4a4cb7f81de5ef0ee01868ea
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Change-Id: I3020443107312cd5ea5eeb0e4bbf8c367571bf9e
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

When switching to a workspace that was streaming in the background,
there's a brief window where the aggregator is cleared and replaying
history. During this window, trust the activity snapshot for
canInterrupt/model/thinkingLevel instead of the empty aggregator state.

Uses the existing transient.caughtUp flag as the guard: only trust the
aggregator once the onChat replay has delivered the caught-up marker.
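
The guard described above can be sketched as follows; the field names (caughtUp, lastModel, etc.) are assumptions for illustration, not the actual store code:

```typescript
// Field names here are assumptions for illustration.
interface ActivitySnapshot {
  streaming: boolean;
  lastModel: string | null;
}

interface AggregatorView {
  caughtUp: boolean; // true once replay delivered the caught-up marker
  streaming: boolean;
  currentModel: string | null;
}

function deriveStatus(
  agg: AggregatorView,
  activity: ActivitySnapshot | undefined
): { canInterrupt: boolean; model: string | null } {
  if (!agg.caughtUp && activity) {
    // Replay still in flight: the aggregator is empty/partial, so prefer the
    // metadata snapshot to avoid flashing an idle state.
    return { canInterrupt: activity.streaming, model: activity.lastModel };
  }
  return { canInterrupt: agg.streaming, model: agg.currentModel };
}
```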

Change-Id: I6d88bb91ced5ba6ce2911ffefdce4b48a6342312
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6687897b9


The setActiveWorkspaceId call was a no-op when the workspace hadn't
been registered in the store yet (isWorkspaceRegistered returns false).
In integration tests, the WorkspaceContext sync may not have completed
by the time setupWorkspaceView runs. Call addWorkspace(metadata) first
to guarantee registration before activation.

Also expose addWorkspace on the workspaceStore wrapper for test access.

Change-Id: I972c6ecc5ce02e0954bbcd8a73131be4ba9727e8
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33
Member Author

@codex review

@ThomasK33
Member Author

@codex review

Addressed the latest P2 race by dropping queued onChat microtask events after attempt abort, plus added a regression test for deferred queueMicrotask execution during active-workspace switches.


Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $51.56

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀


Increase Storybook chat story wait tolerances to avoid flaky retries/timeouts on slower CI runners:
- raise waitForChatMessagesLoaded timeout to 25s
- add explicit waitFor timeouts in ModelSelectorPrettyWithGateway
- widen Exec/tooltip wait windows in affected App.chat stories

This reproduces and fixes the intermittent `Test / Storybook` failures seen on PR #2493.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$51.56`_


Change-Id: Ia0494edbf9e51c47ea3718a46491338b3de176c4
@ThomasK33
Member Author

@codex review

Addressed a reproducible Storybook flake seen on this PR by hardening chat story waits/timeouts (local repro + pass with CI-equivalent make storybook-build && make test-storybook).


Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $51.56

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Breezy!


Address repeated Storybook runner flakes by:
- raising the global storybook test timeout (`make test-storybook --testTimeout 30000`)
- removing the brittle `ModeHelpTooltip` play interaction (hover media-query dependent)
- making `ShowCodeView` wait on the actionable code button instead of transcript loaded markers

This keeps interaction coverage for stable paths while unblocking CI from non-deterministic story startup timing.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$51.56`_


Change-Id: I43f3b6b8558b9357ccfe2328ae9e657ccf0af146
@ThomasK33
Member Author

@codex review

Follow-up for the persistent Storybook flake:

  • made make test-storybook use --testTimeout 30000
  • removed the brittle ModeHelpTooltip play interaction
  • updated ShowCodeView to wait on the actionable code button rather than transcript loaded markers

Local repro/validation now passes reliably with CI-equivalent steps.


Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $51.56

`App/CodeExecution › ShowCodeView` remained non-deterministic in CI and local CI-equivalent runs due to startup timing races. Remove the brittle play interaction while retaining the visual story.

This pairs with the previous Storybook timeout hardening to stabilize the `Test / Storybook` gate for this PR.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$51.56`_


Change-Id: I79bdfea3cb18f81a59667e284d654a62b08b83dc
@ThomasK33
Member Author

@codex review

Final Storybook flake follow-up: removed the remaining non-deterministic ShowCodeView play interaction from App.codeExecution.stories.tsx (visual story remains), after reproducing repeated CI-equivalent failures.


Generated with mux • Model: openai:gpt-5.3-codex • Thinking: xhigh • Cost: $51.56

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Keep them coming!


Reintroduce the ModeHelpTooltip story play interaction, but make it deterministic for CI/Chromatic by forcing the model help trigger visible before hover and asserting tooltip content.

Also add a stable data-component hook on the model help wrapper so the story can target the intended tooltip trigger without brittle selectors.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$52.03`_


Change-Id: Ief635d8d799767d58b9ac78fb3537445166da899
@ThomasK33
Member Author

@codex review

I restored deterministic coverage for the ModeHelpTooltip story by reintroducing the play interaction with a stable selector hook and forcing the trigger visible in headless CI before hover.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 11763b9ccc


Replace the App-backed ModeHelpTooltip story with an isolated tooltip fixture wrapped in TooltipProvider. The story now verifies hover behavior and tooltip content deterministically without depending on full app hydration timing.

Also drop the temporary ChatInput `data-component` test hook since the isolated story no longer needs it.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$52.03`_


Change-Id: I0420cc3458b59197f6dcc781d0fa84d1a70f78b6
@ThomasK33
Member Author

@codex review

I replaced the flaky App-backed ModeHelpTooltip interaction with an isolated TooltipProvider fixture that still validates hover-triggered help content, and reran Storybook interactions locally.

Clarify in the ShowCodeView story that CodeExecutionToolCall automatically switches to the code tab when execution completes without nested tool calls. This keeps the story's intent explicit without reintroducing a flaky interaction step.

Signed-off-by: Thomas Kosiewski <tk@coder.com>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking: `xhigh` • Cost: `$52.03`_


Change-Id: I32f5010d3649ff2fc0b74ca5def6d1acfbfe55e0
@ThomasK33
Member Author

@codex review

Addressed the unresolved ShowCodeView thread by documenting that the story intentionally relies on CodeExecutionToolCall's built-in auto-switch to code view (complete + no nested calls), and resolved the thread.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.


@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍


@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@ThomasK33 ThomasK33 added this pull request to the merge queue Feb 20, 2026
Merged via the queue into main with commit ba2b7b3 Feb 20, 2026
23 checks passed
@ThomasK33 ThomasK33 deleted the chat-subscriptions-dv4n branch February 20, 2026 10:24
github-merge-queue bot pushed a commit that referenced this pull request Feb 21, 2026
…#2530)

Summary
Fixes a regression where the context usage meter disappears after
switching back to a workspace that compacted while backgrounded.

Background
Recent boundary-windowed replay behavior only scans the active
compaction epoch for context usage. When the newest message in that
epoch is the compaction boundary summary and that summary has no
`contextUsage`, the UI shows no usage until later tool/model events
arrive.

Implementation
- `CompactionHandler` now sanitizes compaction stream-end metadata by
stripping stale provider metadata while attaching a post-compaction
context estimate (`systemMessageTokens + summary output tokens`) as
`contextUsage` when available.
- `WorkspaceStore` now checks compaction boundary messages for
`contextUsage` before stopping its backwards epoch scan.
- Added targeted regression tests in both `compactionHandler.test.ts`
and `WorkspaceStore.test.ts`.

Risks
Low-to-moderate. The behavior change is scoped to post-compaction
metadata and usage display fallback. Tests cover both estimate presence
and omission paths.

---

<details>
<summary>📋 Implementation Plan</summary>

# Fix: Context Usage Meter Disappears on Workspace Switch

## Context / Why

After the boundary-windowed chat loading change (PR #2493) and
auto-compaction backend move (PR #2469), the context usage meter
disappears when switching between workspaces. The user must wait for one
or two agent tool calls before it reappears.

**Root cause:** When idle compaction fires while a workspace is
backgrounded, the compaction summary intentionally strips `contextUsage`
(to avoid displaying stale pre-compaction values). On switch-back, the
frontend replays only the current epoch (post-boundary). If the only
message in that epoch is the compaction summary (which has no
`contextUsage`), the backward scan returns `undefined` → meter shows
empty.

## Evidence

| File | What it tells us |
|---|---|
| `src/node/services/compactionHandler.ts:515-535` | `sanitizeCompactionStreamEndEvent` strips `contextUsage`, `providerMetadata`, `contextProviderMetadata` from the compaction stream-end event |
| `src/node/services/compactionHandler.ts:688-706` | Compaction summary message has `systemMessageTokens` and `usage` (with `outputTokens` = summary size) but no `contextUsage` |
| `src/browser/stores/WorkspaceStore.ts:1690-1706` | Backward scan `break`s at `isDurableCompactionBoundaryMarker` **without** checking the boundary message for `contextUsage` |
| `src/node/services/agentSession.ts:2323-2360` | Backend `seedUsageStateFromHistory` scans the boundary epoch (including boundary msg) for `contextUsage` — currently finds nothing because the boundary has none |
| `src/common/utils/messages/compactionBoundary.ts:11-13` | `hasDurableCompactedMarker` accepts `true \| "user" \| "idle"` |
| `src/node/services/compactionMonitor.ts` | Auto-compaction threshold is 70% (`DEFAULT_AUTO_COMPACTION_THRESHOLD = 0.7`), triggered by the backend's own `lastUsageState` — completely independent of the frontend display |

## Approach: Post-Compaction Context Estimate on Boundary Messages

Instead of stripping `contextUsage` entirely, **replace** it with a
computed post-compaction estimate representing the approximate context
window size *after* compaction (system prompt + summary).

### Why this is safe from compaction loops

The post-compaction estimate is inherently small:

```
estimate = systemMessageTokens + compactionSummary.outputTokens
         ≈ 5–15% of context window
```

- Auto-compaction triggers at **70%** — the estimate is far below this.
- Backend `seedUsageStateFromHistory` would find this small value on
restart → no re-compaction.
- The compaction trigger (`CompactionMonitor`) uses the backend's
in-memory `lastUsageState` from actual provider responses, **not** the
frontend display. Even if the estimate were somehow wrong, it cannot
cause a backend loop.
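
To make the margin concrete, here is a small numeric sketch. The token counts (200k window, 3k system prompt, 8k summary) are hypothetical round numbers for illustration, not values taken from this repo:

```typescript
// Hypothetical sizes: a 200k-token context window, a ~3k-token system
// prompt, and a ~8k-token compaction summary.
const CONTEXT_WINDOW_TOKENS = 200_000;
const systemMessageTokens = 3_000;
const summaryOutputTokens = 8_000;

// Post-compaction estimate, as defined above: system prompt + summary.
const estimate = systemMessageTokens + summaryOutputTokens; // 11_000

// Fraction of the context window the estimate occupies.
const fraction = estimate / CONTEXT_WINDOW_TOKENS; // 0.055, i.e. 5.5%

// Far below the 70% auto-compaction threshold, so seeding usage state
// from this value cannot immediately re-trigger compaction.
const DEFAULT_AUTO_COMPACTION_THRESHOLD = 0.7;
console.log(fraction < DEFAULT_AUTO_COMPACTION_THRESHOLD); // true
```

Even with an unusually large summary, the estimate lands an order of magnitude below the threshold.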

### What the user sees

| State | Meter |
|---|---|
| Before compaction | 70% (real) |
| After compaction, before next response | ~10% (estimate — system prompt + summary) |
| After next agent response | Real value from new `contextUsage` |

## Implementation Details (~20 net LoC)

### 1. Backend: Compute estimate in `CompactionHandler` 

**File:** `src/node/services/compactionHandler.ts`

In `sanitizeCompactionStreamEndEvent`, instead of stripping
`contextUsage`, replace it with a post-compaction estimate:

```typescript
private sanitizeCompactionStreamEndEvent(event: StreamEndEvent): StreamEndEvent {
  const { providerMetadata, contextProviderMetadata, contextUsage, timestamp, ...cleanMetadata } =
    event.metadata;

  // Compute a post-compaction context estimate: system prompt + summary tokens.
  // This gives the frontend a directionally-correct "near empty" reading while
  // preventing stale pre-compaction values from inflating the meter.
  const postCompactionContextEstimate = this.computePostCompactionContextEstimate(
    cleanMetadata.systemMessageTokens,
    cleanMetadata.usage,
  );

  const sanitizedEvent: StreamEndEvent = {
    ...event,
    metadata: {
      ...cleanMetadata,
      ...(postCompactionContextEstimate && { contextUsage: postCompactionContextEstimate }),
    },
  };

  assert(
    sanitizedEvent.metadata.providerMetadata === undefined &&
      sanitizedEvent.metadata.contextProviderMetadata === undefined,
    "Compaction stream-end event must not carry stale provider metadata",
  );

  return sanitizedEvent;
}

/**
 * Approximate context window size after compaction: system prompt + summary.
 * Returns undefined if inputs are missing (graceful fallback to no-data).
 */
private computePostCompactionContextEstimate(
  systemMessageTokens: number | undefined,
  usage: LanguageModelV2Usage | undefined,
): LanguageModelV2Usage | undefined {
  const summaryTokens = usage?.outputTokens;
  if (summaryTokens == null || summaryTokens <= 0) return undefined;

  const systemTokens = systemMessageTokens ?? 0;
  const estimatedInputTokens = systemTokens + summaryTokens;

  return {
    inputTokens: estimatedInputTokens,
    outputTokens: 0,
  };
}
```

### 2. Frontend: Read boundary's `contextUsage` before breaking

**File:** `src/browser/stores/WorkspaceStore.ts`

Modify the backward scan (~line 1693) to check the boundary message for
`contextUsage` before breaking:

```typescript
const lastContextUsage = (() => {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (isDurableCompactionBoundaryMarker(msg)) {
      // Boundary may carry a post-compaction context estimate.
      // Check before breaking so the meter shows "near-empty" instead of nothing.
      const rawUsage = msg.metadata?.contextUsage;
      if (rawUsage && msg.role === "assistant") {
        const msgModel = msg.metadata?.model ?? model ?? "unknown";
        return createDisplayUsage(rawUsage, msgModel, undefined);
      }
      break;
    }
    if (msg.role === "assistant") {
      if (msg.metadata?.compacted) continue;
      // ... existing contextUsage extraction (unchanged)
    }
  }
  return undefined;
})();
```
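
For intuition, the fallback can be exercised in isolation with a stripped-down message shape. The `Msg` interface, `compactionBoundary` flag, and `lastContextUsage` helper below are simplified stand-ins for the real `WorkspaceStore` types and helpers, not the actual API:

```typescript
// Simplified stand-in for the chat message shape used by the scan.
interface Msg {
  role: "user" | "assistant";
  compactionBoundary?: boolean;
  contextUsage?: { inputTokens: number; outputTokens: number };
}

// Backward scan: newest usage wins; the compaction boundary is checked
// for a post-compaction estimate before the scan stops, so the meter
// falls back to "near empty" instead of going blank.
function lastContextUsage(messages: Msg[]): Msg["contextUsage"] {
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    if (msg.compactionBoundary) {
      // Boundary may carry the post-compaction estimate; read it,
      // then stop scanning into the previous epoch either way.
      return msg.contextUsage;
    }
    if (msg.role === "assistant" && msg.contextUsage) {
      return msg.contextUsage;
    }
  }
  return undefined;
}
```

A boundary-only epoch yields the estimate; any newer assistant message with real usage takes precedence.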

### 3. Update assertion message

**File:** `src/node/services/compactionHandler.ts`

The existing assertion (line 527-531) checks `contextUsage ===
undefined`. Update it to allow the new estimate:

```typescript
assert(
  sanitizedEvent.metadata.providerMetadata === undefined &&
    sanitizedEvent.metadata.contextProviderMetadata === undefined,
  "Compaction stream-end event must not carry stale provider metadata",
);
```

(Remove the `contextUsage === undefined` check from the assertion.)

### 4. Tests

- **CompactionHandler tests:** Verify `sanitizeCompactionStreamEndEvent`
produces a post-compaction estimate (not the pre-compaction value) when
`systemMessageTokens` and `usage.outputTokens` are available, and
produces `undefined` when they're missing.
- **WorkspaceStore tests (or unit test for the scan logic):** Verify the
backward scan reads `contextUsage` from a boundary message when no newer
messages have it.
- **Integration/manual:** Switch between workspaces after idle
compaction → meter should show a small value instead of disappearing.

<details>
<summary>Alternatives considered</summary>

### Alt A: Store `lastContextUsage` in `session-usage.json`

Persist `lastContextUsage` to disk alongside existing session usage
data. Clear it on compaction.

- **Pro:** Survives restarts independently of message history.
- **Con:** After compaction, still shows empty (cleared). Adds new
persistence field + service changes.
- **Con:** More moving parts, touches both `SessionUsageService` and
`WorkspaceStore`.

### Alt B: Include `lastContextUsage` in the `caughtUp` IPC payload

Backend sends its in-memory `lastUsageState` in the `caughtUp` event.

- **Pro:** Always authoritative, no new persistence.
- **Con:** `lastUsageState` may be stale/undefined after compaction.
Requires IPC schema change. Frontend still needs fallback logic.

### Alt C: Frontend caches value before clearing aggregator

Before `resetChatStateForReplay` clears the aggregator, snapshot the
current `lastContextUsage` and use as fallback.

- **Pro:** Frontend-only change.
- **Con:** Shows stale pre-compaction value (70%) then drops to real
value — worse UX. Doesn't survive restart.

### Alt D: Don't strip `contextUsage` from compaction boundary (carry
forward original)

- **Con:** This is the infinite loop scenario the user warned about.
Backend `seedUsageStateFromHistory` would read the inflated
pre-compaction value, seed `lastUsageState` to 70%, and
`checkBeforeSend` would immediately trigger another compaction.
</details>

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.3-codex` • Thinking:
`xhigh` • Cost: `$8.29`_

<!-- mux-attribution: model=openai:gpt-5.3-codex thinking=xhigh
costs=8.29 -->
github-merge-queue bot pushed a commit that referenced this pull request Feb 23, 2026
…t totals (#2546)

## Summary

Fixes stale workspace cost totals when switching between workspaces. The
frontend fetched persisted session usage (`session-usage.json`) only
once during workspace registration (`addWorkspace`), so cost rollups
arriving while a workspace was inactive (e.g., sub-agent deletions) were
never picked up until a hard refresh.

## Background

PR #2493 introduced boundary-windowed chat loading, where only the
active workspace receives a full `onChat` subscription. Non-active
workspaces use metadata-only activity feeds. This means
`session-usage-delta` events — which carry sub-agent cost rollups — only
reach the currently active workspace.

Since `getSessionUsage()` was only called once during `addWorkspace()`,
switching back to a workspace that received rollups while inactive would
show stale (lower) cost totals. Users reported workspaces dropping from
~$20 to <$5 until `Ctrl+Shift+R` forced a full reload.

## Implementation

- Extracted the inline `getSessionUsage` fetch + repricing logic from
`addWorkspace()` into a shared `refreshSessionUsage()` private method.
- Added a per-workspace request version guard so slower/older fetch
responses cannot overwrite fresher state during rapid workspace
switches.
- `setActiveWorkspaceId()` now calls `refreshSessionUsage()` for the
newly active workspace, re-hydrating any cost data that arrived while it
was inactive.
- Cleanup: `removeWorkspace()` clears the request-version tracking for
deleted workspaces.
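
The per-workspace request-version guard described above can be sketched as follows. The class and method names (`UsageRefresher`, `refreshSessionUsage`, the injected `fetch` callback) are illustrative; the real logic lives inside `WorkspaceStore`:

```typescript
// Minimal sketch of a per-workspace request-version guard, under the
// assumption that usage is fetched via an injected async callback.
type SessionUsage = { sessionTotal: number };

class UsageRefresher {
  private requestVersion = new Map<string, number>();
  readonly usage = new Map<string, SessionUsage>();

  constructor(private fetchUsage: (workspaceId: string) => Promise<SessionUsage>) {}

  async refreshSessionUsage(workspaceId: string): Promise<void> {
    // Bump the version before fetching; any older in-flight request
    // sees a mismatched version on completion and drops its result.
    const version = (this.requestVersion.get(workspaceId) ?? 0) + 1;
    this.requestVersion.set(workspaceId, version);

    const result = await this.fetchUsage(workspaceId);

    // A newer refresh superseded this one while we awaited — discard.
    if (this.requestVersion.get(workspaceId) !== version) return;
    this.usage.set(workspaceId, result);
  }

  removeWorkspace(workspaceId: string): void {
    this.requestVersion.delete(workspaceId);
    this.usage.delete(workspaceId);
  }
}
```

This keeps a slow response from an earlier activation from overwriting the result of a later one during rapid workspace switches.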

## Validation

- New regression tests:
- Verifies activation triggers a fresh `getSessionUsage` fetch and
hydrates `sessionTotal`
- Verifies stale in-flight responses are dropped when a newer refresh
supersedes them
- All existing `WorkspaceStore.test.ts` tests pass
- `make typecheck` passes

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=9.21 -->